Batch processing

Batch processing is execution of a series of programs ("jobs") on a computer without manual intervention.

Batch jobs are set up so they can be run to completion without manual intervention, so all input data is preselected through scripts or command-line parameters. This is in contrast to "online" or interactive programs which prompt the user for such input. A program takes a set of data files as input, processes the data, and produces a set of output data files. This operating environment is termed as "batch processing" because the input data are collected into batches of files and are processed in batches by the program.

Contents

Benefits

Batch processing has these benefits:

History

Batch processing has been associated with mainframe computers since the earliest days of electronic computing in the 1950s. There were a variety of reasons why batch processing dominated early computing. One reason is that the most urgent business problems for reasons of profitability and competitiveness were primarily accounting problems, such as billing. Billing is inherently a batch-oriented business process, and practically every business must bill, reliably and on-time. Also, every computing resource was expensive, so sequential submission of batch jobs matched the resource constraints and technology evolution at the time. Later, interactive sessions with either text-based computer terminal interfaces or graphical user interfaces became more common. However, computers initially were not even capable of having multiple programs loaded into the main memory.

Batch processing is still pervasive in mainframe computing, but practically all types of computers are now capable of at least some batch processing, even if only for "housekeeping" tasks. That includes UNIX-based computers, Microsoft Windows, Mac OS X, and even smartphones, increasingly. Virus scanning is a form of batch processing, and so are scheduled jobs that periodically delete temporary files that are no longer required. E-mail systems frequently have batch jobs that periodically archive and compress old messages. As computing in general becomes more pervasive in society and in the world, so too will batch processing.

Modern systems

Despite their long history, batch applications are still critical in most organizations in large part because many core business processes are inherently batch-oriented. (Billing is a notable example that nearly every business requires to function.) While online systems can also function when manual intervention is not desired, they are not typically optimized to perform high-volume, repetitive tasks. Therefore, even new systems usually contain one or more batch applications for updating information at the end of the day, generating reports, printing documents, and other non-interactive tasks that must complete reliably within certain business deadlines.

Modern batch applications make use of modern batch frameworks such as Spring Batch, which is written for Java, and other frameworks for other programming languages, to provide the fault tolerance and scalability required for high-volume processing. In order to ensure high-speed processing, batch applications are often integrated with grid computing solutions to partition a batch job over a large number of processors, although there are significant programming challenges in doing so. High volume batch processing places particularly heavy demands on system and application architectures as well. Architectures that feature strong input/output performance and vertical scalability, including modern mainframe computers, tend to provide better batch performance than alternatives.

Scripting languages became popular as they evolved along with batch processing.

Common batch processing usage

Data processing

A typical batch processing schedule includes end of day- reporting (EOD). Historically, many systems had a batch window where online subsystems were turned off and the system capacity was used to run jobs common to all data (accounts, users, or customers) on a system. In a bank, for example, EOD jobs include interest calculation, generation of reports and data sets to other systems, printing (statements), and payment processing. Many businesses have moved to concurrent online and batch architectures in order to support globalization, the Internet, and other relatively newer business demands. Such architectures place unique stresses on system design, programming techniques, availability engineering, and IT service delivery.

Databases

Batch processing is also used for efficient bulk database updates and automated transaction processing, as contrasted to interactive online transaction processing (OLTP) applications. The extract, transform, load (ETL) step in populating data warehouses is inherently a batch process in most implementations.

Images

Batch processing is often used to perform various operations with digital images. Computer programs exist that let one resize, convert, watermark, or otherwise edit image files.

Converting

Batch processing is also used for converting a number of computer files from one format to another. This is to make files portable and versatile especially for proprietary and legacy files where viewers are not easy to come by.

Notable batch scheduling and execution environments

UNIX utilizes cron and at facilities to allow for scheduling of complex job scripts. Windows has a job scheduler. Most high-performance computing clusters use batch processing to maximize cluster usage.

IBM's z/OS has arguably the most highly refined and evolved set of batch processing facilities owing to its origins, long history, and continuing evolution, and today such systems commonly support hundreds or even thousands of concurrent online and batch tasks within a single operating system image. Mainframe-unique technologies that aid concurrent batch and online processing include Job Control Language (JCL), scripting languages such as REXX, Job Entry Subsystem (JES2 and JES3), Workload Manager (WLM), Automatic Restart Manager (ARM), Resource Recovery Services (RRS), DB2 data sharing, Parallel Sysplex, unique performance optimizations such as HiperDispatch, I/O channel architecture, and several others.

See also

External links